-
Notifications
You must be signed in to change notification settings - Fork 0
Support for multiple LoRAs with multiple models via BBR using convent… #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
… /pkg/epp/datalayer/factory.go:28:2: no required module provides package sigs.k8s.io/gateway-api-inference-extension/api/v1
One liner: add COPY api ./api command to resolve import dependency in…
…ional name for LoRA: model-name/lora/lora-name
logger.V(logutil.DEFAULT).Info("Orig: " + string(orig)) | ||
|
||
if idx := bytes.Index(orig, []byte(loraTag)); idx != -1 { | ||
lastSlash := bytes.LastIndex(orig[:idx], []byte("/")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this assuming adapter name doesn't contain /
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I take everything before loraTag
as prefix
and everything after loraTag
as suffix
.
// skip the slash itself by adding +1 so suffix doesn't start with '/' | ||
suffix = afterTag[nextSlash+1:] | ||
logger.V(logutil.DEFAULT).Info("Model name after mutation:" + string(suffix)) | ||
requestBody.Model = string(suffix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should update the content-length header when changing anything in the content
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review comments provided
hub: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension | ||
tag: main | ||
pullPolicy: Always | ||
#tag: main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few comments
- while fine for local development, these should not be part of the PR. The charts should continue to work with main.
- also for local dev, you want
Always
(even when running on kind), since you want newly built images to download to the workers (from Kind, not docker)
backendRefs: | ||
- name: deepseek-r1 | ||
group: inference.networking.x-k8s.io | ||
kind: InferencePool No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing newline at EoF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what YAMLs are used by the current guides? Those are not checked in so you created a new one?
If so, might want to refer to the same file in the multi-model guide.
Same applies to other YAMLs in this PR
"bytes" | ||
"context" | ||
"encoding/json" | ||
"os" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a IGW package for interacting with env, including setting of defaults, casting to types, etc.
return nil, err | ||
} | ||
|
||
//The reason for this additional unmarshal is that I change the model name and then re-marshal RequestBody struct. But it has only one field, and I need to preserve original message at re-marshalling. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing space after //
in comments.
Applies to multiple places.
|
||
//The reason for this additional unmarshal is that I change the model name and then re-marshal RequestBody struct. But it has only one field, and I need to preserve original message at re-marshalling. | ||
//This can be done more efficiently if a full "official" struct by OpenAI is used. In OpenAI v2 it should be ChatCompletionNewParams | ||
var raw map[string]json.RawMessage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could unmarshal from/to map[string]any
and not lose the info.
Also, if there's a standard (or commonly used) OpenAI parsing package, suggest PR converting BBR to using that instead of custom code.
//Convention: [model-family]/<model-name>/lora/<lora-name> | ||
//Model name definition (the vLLM side) does not change: <my-arbitrary-lora-name> | ||
//Model name in request (the client side): lora-name (no change from before) | ||
loraTag := os.Getenv("LORA_TAG") //set via environment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be preferred, IMO, to make that change in a separate function and not mutate the current implementation.
also, using a callback would be more aligned with the expectations of readers given the structure and use of plugins in EPP.
loraTag = "lora" | ||
} | ||
|
||
orig := []byte(requestBody.Model) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be easier to parse using strings.Split
by casting byte array to string.
suffix = afterTag[nextSlash+1:] | ||
logger.V(logutil.DEFAULT).Info("Model name after mutation:" + string(suffix)) | ||
requestBody.Model = string(suffix) | ||
// update only the "model" field in the original raw map so other fields (e.g. prompt) are preserved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you explain the convention in the PR message?
I think you're expecting model to be in the format of <some-base-model-to-dispatch-to> <tag (e.g., /lora)> <model-name-as-known-by-vllm>
.
Is that correct?
requestBody.Model = string(suffix) | ||
// update only the "model" field in the original raw map so other fields (e.g. prompt) are preserved | ||
modelBytes, merr := json.Marshal(requestBody.Model) | ||
if merr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not reuse err instead of merr and merr2?
//Mutate model name if it contains reserved keyword "lora" indicating that the requested model is lora (served from the same vLLM as the base model and from the same inferencepool) | ||
//Convention: [model-family]/<model-name>/lora/<lora-name> | ||
//Model name definition (the vLLM side) does not change: <my-arbitrary-lora-name> | ||
//Model name in request (the client side): lora-name (no change from before) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Is this comment (line 84) accurate ? This is saying the client side has no change from before but I think in this feature you are proposing the client prompt has to have the model name in the format you listed in line 82 which will be a change for most clients I believe.
- Also it seems that clients will be required to know the exact backend LORA name which they may not usually know.
309637b
to
dd77248
Compare
…ional name for LoRA: model-name/lora/lora-name